
Don't reopen file for every seek if we don't have to. Search directionally for the correct file #1117

Merged: 3 commits merged into rolling from emersonknapp/seek-no-reopen on Nov 7, 2022

Conversation

emersonknapp
Collaborator

@emersonknapp emersonknapp commented Oct 6, 2022

Use case: rqt_bag uses frequent seek (for every message). This might not be the best way to use rosbag, but regardless it highlights a big performance issue with seek - that we reopen the storage file every single time. Change this to directionally look for a file that contains the requested timestamp, then seek within that bag. This makes it so that repeated seeks within the same file keep that file open.

Discovered in testing for ros-visualization/rqt_bag#126

@emersonknapp emersonknapp requested a review from a team as a code owner October 6, 2022 23:39
@emersonknapp emersonknapp requested review from MichaelOrlov and jhdcs and removed request for a team October 6, 2022 23:39
@MichaelOrlov
Contributor

@emersonknapp I just did an analysis of this problem yesterday, while trying to figure out why we have a strange warning in the player.

[WARN] [rosbag2_storage]: No storage plugin found with id 'sqlite3'.

I came to the conclusion that in

storage_ = storage_factory_->open_read_only(storage_options_);

we need to add an API that can open another file in the current storage instead of recreating the storage. Recreating the storage is expensive: it reloads the plugin in memory. And we call load_current_file() for each seek(timestamp) call.

Note: I haven't had a chance to review your changes yet.
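The storage-reuse API suggested above can be sketched minimally. This is a hypothetical illustration (the `Storage` class, `open_file`, and `plugin_loads` names are invented for the sketch, not the rosbag2 interface): the plugin is loaded once at construction, and switching files only swaps the file handle.

```cpp
#include <cassert>
#include <string>

// Sketch: a storage object that keeps its plugin loaded across file switches,
// instead of the factory recreating the whole storage for each seek.
class Storage
{
public:
  explicit Storage(const std::string & path)
  : plugin_loads_(1), current_file_(path) {}  // plugin loaded exactly once

  // Reuses the already-loaded plugin; only the open file changes.
  void open_file(const std::string & path) {current_file_ = path;}

  int plugin_loads() const {return plugin_loads_;}
  const std::string & current_file() const {return current_file_;}

private:
  int plugin_loads_;       // how many times the plugin was (re)loaded
  std::string current_file_;
};
```

With this shape, any number of open_file() calls amortize the one-time plugin load, which is the expensive step identified above.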

Contributor

@jhdcs jhdcs left a comment


These changes look good to me. I agree on removing the misleading warning - best not to confuse users by warning about normal operations.

Contributor

@MichaelOrlov MichaelOrlov left a comment


@emersonknapp Thank you for your contribution, and sorry for my late response.
This is certainly an improvement compared to how it was before.
Although I would like to give some "love" to the logic in the implementation to make it clearer.

I have some concerns about the API changes, in particular has_next_file(bool reverse) and load_next_file(bool reverse).
After some consideration, I think it would be better to create separate has_previous_file() and load_previous_file() APIs.
You may argue that the current approach aligns better with the existing implementation, since we already have a reverse flag to rely on.

On the one hand, it looks easy and more straightforward in one place:

if (!current_storage_has_next && has_next_file(read_order_.reverse)) {
  load_next_file(read_order_.reverse);
}

However, on the other hand, it's not very clear in another place, where the logic is more sophisticated:

if (timestamp < start_time && has_next_file(true)) {
  // Check back a file if the timestamp is before the beginning of the current file
  load_next_file(true);
  return seek(timestamp);
} else if (timestamp > end_time && has_next_file(false)) {
  // Check forward a file if the timestamp is after the end of the current file
  load_next_file(false);
  return seek(timestamp);
} else {
  // The timestamp lies in the range of this file, or there are no files left to go to
  storage_->seek(timestamp);
}

IMO it will be much cleaner to understand what is going on if the API has self-describing, granular names, for instance:

if (timestamp < start_time && has_prev_file()) {
  // Check back a file if the timestamp is before the beginning of the current file
  load_prev_file();
  return seek(timestamp);
} else if (timestamp > end_time && has_next_file()) {
  // Check forward a file if the timestamp is after the end of the current file
  load_next_file();
  return seek(timestamp);
} else {
  // The timestamp lies in the range of this file, or there are no files left to go to
  storage_->seek(timestamp);
}

When someone tries to comprehend this logic and sees a construction like has_next_file(true), it's not obvious that this call is really checking for the existence of the previous file rather than the next one. That will cause a lot of confusion: the reader has to keep in mind that the "next" file is actually the previous one, even though it is named "next". It's a more error-prone approach.

You could also argue that it's done for the sake of reducing code duplication in the next_/previous_ methods. I considered this argument as well, and it doesn't really hold: only a few lines of code would be duplicated between load_next_file() and load_previous_file():

assert(current_file_iterator_ != file_paths_.end());
auto info = std::make_shared<bag_events::BagSplitInfo>();
info->closed_file = get_current_file();
if (reverse) {
  current_file_iterator_--;
} else {
  current_file_iterator_++;
}
info->opened_file = get_current_file();
load_current_file();
callback_manager_.execute_callbacks(bag_events::BagEvent::READ_SPLIT, info);

and the logic there is pretty straightforward; it would become even simpler, i.e. without any if-else branching, if we duplicated it in the relevant methods.

I was also thinking about the performance impact of calling get_metadata from the storage for each seek operation.

auto metadata = storage_->get_metadata();

In the current implementation, at least in the sqlite3 storage plugin, this is a pretty expensive call.
We query the database to pull out information about all topics and to count all messages; internally, the DB engine essentially iterates over every message to produce the result of such a SQL request.
I think we can optimize this if we read out and calculate the metadata only once, when opening the storage, and keep it in a private member variable, ready to be returned on request from the get_metadata() method.
Although there is one caveat with this approach: we need to take into account the situation where get_metadata() is called on a storage that was opened with the read_write or append io_flag.
We should read out the metadata only if io_flag == READ_ONLY.
We could build this in somewhere around here:

if (is_read_write(io_flag)) {
initialize();
}

and we would need to keep the io_flag as a private member variable, so that get_metadata() can decide whether to do a fresh metadata readout or return the cached value saved during storage opening.
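The caching suggested here can be sketched as follows. This is a hypothetical illustration, not the actual sqlite3 plugin code (the class, enum, and counter names are invented for the sketch): metadata is read once at open time when the storage is read-only, with a fresh readout otherwise.

```cpp
#include <cassert>

enum class IOFlag { READ_ONLY, READ_WRITE, APPEND };

struct Metadata
{
  int message_count = 0;
};

class SqliteStorage
{
public:
  void open(IOFlag io_flag)
  {
    io_flag_ = io_flag;
    if (io_flag_ == IOFlag::READ_ONLY) {
      cached_metadata_ = read_metadata_from_db();  // pay the scan cost once
    }
  }

  Metadata get_metadata()
  {
    if (io_flag_ == IOFlag::READ_ONLY) {
      return cached_metadata_;  // bag cannot change: cached value stays valid
    }
    return read_metadata_from_db();  // writers may have added messages
  }

  int db_scans() const {return db_scans_;}

private:
  Metadata read_metadata_from_db()
  {
    ++db_scans_;  // stand-in for the expensive full-table SQL scan
    return Metadata{42};
  }

  IOFlag io_flag_ = IOFlag::READ_ONLY;
  Metadata cached_metadata_;
  int db_scans_ = 0;
};
```

A read-only storage then performs one scan regardless of how many seeks call get_metadata(), while read_write/append storages keep the current always-fresh behavior.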

@emersonknapp
Collaborator Author

I'm fine with a has_prev_file and load_prev_file API instead of the parameter. I'll make that change.

@emersonknapp emersonknapp force-pushed the emersonknapp/seek-no-reopen branch from e948e95 to f597525 on November 2, 2022 00:19
@emersonknapp
Collaborator Author

@MichaelOrlov Updated based on review; re-review requested. (Sorry for the slow turnaround, I didn't work during the Japan ROSCon trip.)

Contributor

@MichaelOrlov MichaelOrlov left a comment


@emersonknapp Looks good enough to be merged with green CI.

Although it could be even better if we addressed the performance impact of calling get_metadata from the storage for each seek operation.

auto metadata = storage_->get_metadata();

In the current implementation, at least in the sqlite3 storage plugin, this is a pretty expensive call.
We query the database to pull out information about all topics and to count all messages; internally, the DB engine essentially iterates over every message to produce the result of such a SQL request.
I think we can optimize this if we read out and calculate the metadata only once, when opening the storage, and keep it in a private member variable, ready to be returned on request from the get_metadata() method.
Although there is one caveat with this approach: we need to take into account the situation where get_metadata() is called on a storage that was opened with the read_write or append io_flag.
We should read out the metadata only if io_flag == READ_ONLY.
We could build this in somewhere around here:

if (is_read_write(io_flag)) {
initialize();
}

and we would need to keep the io_flag as a private member variable, so that get_metadata() can decide whether to do a fresh metadata readout or return the cached value saved during storage opening.

@emersonknapp Thoughts? Maybe implement it in a follow-up issue?

@MichaelOrlov MichaelOrlov merged commit 0c7c352 into rolling Nov 7, 2022
@delete-merged-branch delete-merged-branch bot deleted the emersonknapp/seek-no-reopen branch November 7, 2022 20:10
@MichaelOrlov MichaelOrlov restored the emersonknapp/seek-no-reopen branch November 7, 2022 20:10
MichaelOrlov added a commit that referenced this pull request Nov 7, 2022
…directionally for the correct file (#1117)"

This reverts commit 0c7c352.
@MichaelOrlov
Contributor

Gist: https://gist.githubusercontent.com/MichaelOrlov/310884637693cd81b243857191413684/raw/cb6b5965073a9c3c99085a1d4f69258479aa7998/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_cpp rosbag2_storage rosbag2_tests rosbag2_transport
TEST args: --packages-above rosbag2_cpp rosbag2_storage rosbag2_tests rosbag2_transport
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/11087

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@MichaelOrlov
Contributor

@emersonknapp Oops, I accidentally merged this PR without running CI.
I've restored the branch and am trying to run CI.
I hope we haven't broken anything.

@MichaelOrlov
Contributor

As regards my suggestion about caching metadata in get_metadata(), I have addressed it in #1149

@MichaelOrlov
Contributor

All fine. I am deleting the branch and closing #1151

@emersonknapp
Collaborator Author

Thank you for the follow-through! I meant to say that yes, I think the other changes should be follow-ups to this one, to keep PRs small.

@MichaelOrlov
Contributor

@mergify backport humble

mergify bot pushed a commit that referenced this pull request Dec 7, 2022
…nally for the correct file (#1117)

* Don't reopen file for every seek if we don't have to. Search directionally for the correct file

Signed-off-by: Emerson Knapp <[email protected]>

* Fix use of load_next_file in test

Signed-off-by: Emerson Knapp <[email protected]>

* has_prev_file and load_prev_file API instead of has_next(bool)

Signed-off-by: Emerson Knapp <[email protected]>

Signed-off-by: Emerson Knapp <[email protected]>
(cherry picked from commit 0c7c352)

# Conflicts:
#	rosbag2_cpp/src/rosbag2_cpp/readers/sequential_reader.cpp
@mergify

mergify bot commented Dec 7, 2022

backport humble

✅ Backports have been created
